{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4YErqpfH9jVI"
   },
   "source": [
    "# RAG 评估\n",
    "_作者 [Aymeric Roucher](https://huggingface.co/m-ric)_\n",
    "\n",
    "本 notebook 演示了如何评估你的 RAG（Retrieval Augmented Generation），通过构建一个合成评估数据集并使用 LLM-as-a-judge 来计算你系统的准确性。\n",
    "\n",
    "对于 RAG 系统的介绍，你可以查看[这个技术指南](rag_zephyr_langchain)!\n",
    "\n",
    "RAG 系统很复杂: 这里有一个 RAG 流程图，我们用蓝色标注了系统增强的所有可能性：\n",
    "\n",
    "<img src=\"https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/RAG_workflow.png\" height=\"700\">\n",
    "\n",
    "实施上述任何改进都可能会带来巨大的性能提升；但如果无法监控对系统性能的影响，那么进行任何更改都是无用的！让我们看看如何评估我们的 RAG 系统。\n",
    "\n",
    "### 评估RAG性能\n",
    "\n",
    "由于有如此多的部分需要调整，这些部分对性能有很大影响，因此对 RAG 系统进行基准测试是至关重要的。\n",
    "\n",
    "对于我们的评估流水线，我们将需要：\n",
    "1. 一个带有问题-答案对的评估数据集（QA 对）\n",
    "2. 一个评估器，用于计算我们的系统在上面的评估数据集上的准确性。\n",
    "\n",
    "➡️ 结果发现，我们可以在整个过程中使用 LLMs 来帮助！\n",
    "1. 评估数据集将由 LLM 🤖 合成生成，并且问题将由其他 LLM 🤖 过滤掉\n",
    "2. 然后，[LLM-as-a-judge](https://huggingface.co/papers/2306.05685) 智能体 🤖 将在这个合成数据集上执行评估。\n",
    "\n",
    "\n",
    "__让我们深入挖掘并开始构建我们的评估流水线！__ 首先，安装所需的模型依赖项。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "bCKBvOcp9jVK"
   },
   "outputs": [],
   "source": [
    "!pip install -q torch transformers transformers langchain sentence-transformers tqdm openpyxl openai pandas datasets langchain-community ragatouille"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "k_lJFbYm9jVL"
   },
   "outputs": [],
   "source": [
    "%reload_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "oIlNZ1Mn9jVL"
   },
   "outputs": [],
   "source": [
    "from tqdm.auto import tqdm\n",
    "import pandas as pd\n",
    "from typing import Optional, List, Tuple\n",
    "import json\n",
    "import datasets\n",
    "\n",
    "pd.set_option(\"display.max_colwidth\", None)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from huggingface_hub import notebook_login\n",
    "\n",
    "notebook_login()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zeW8P62J9jVM"
   },
   "source": [
    "### 加载你的知识基础"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "YRbm5tNF9jVM"
   },
   "outputs": [],
   "source": [
    "ds = datasets.load_dataset(\"m-ric/huggingface_doc\", split=\"train\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wy9CKj0M9jVM"
   },
   "source": [
    "# 1. 为评估构建合成数据集\n",
    "\n",
    "我们首先构建一个问题和相关上下文的综合数据集。方法是先从我们的知识库中获取元素，并让 LLM 根据这些文档生成问题。\n",
    "\n",
    "然后，我们设置其他 LLM 智能体作为生成问答对的质置过滤器：每个智能体将作为一个特定缺陷的过滤器。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QkoEgiDg9jVM"
   },
   "source": [
    "### 1.1. 准备源数据文档"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "3gTOlRKO9jVM"
   },
   "outputs": [],
   "source": [
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "from langchain.docstore.document import Document as LangchainDocument\n",
    "\n",
    "langchain_docs = [\n",
    "    LangchainDocument(page_content=doc[\"text\"], metadata={\"source\": doc[\"source\"]})\n",
    "    for doc in tqdm(ds)\n",
    "]\n",
    "\n",
    "\n",
    "text_splitter = RecursiveCharacterTextSplitter(\n",
    "    chunk_size=2000,\n",
    "    chunk_overlap=200,\n",
    "    add_start_index=True,\n",
    "    separators=[\"\\n\\n\", \"\\n\", \".\", \" \", \"\"],\n",
    ")\n",
    "\n",
    "docs_processed = []\n",
    "for doc in langchain_docs:\n",
    "    docs_processed += text_splitter.split_documents([doc])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WjrNhcCh9jVN"
   },
   "source": [
    "### 1.2. 为问题生成设置智能体\n",
    "\n",
    "我们采用 [Mixtral](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) 作为问答对的生成，因为他在各个排行榜上表现极佳，比如 [Chatbot Arena](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "GoRySj3Q9jVN"
   },
   "outputs": [],
   "source": [
    "from huggingface_hub import InferenceClient\n",
    "\n",
    "\n",
    "repo_id = \"mistralai/Mixtral-8x7B-Instruct-v0.1\"\n",
    "\n",
    "llm_client = InferenceClient(\n",
    "    model=repo_id,\n",
    "    timeout=120,\n",
    ")\n",
    "\n",
    "\n",
    "def call_llm(inference_client: InferenceClient, prompt: str):\n",
    "    response = inference_client.post(\n",
    "        json={\n",
    "            \"inputs\": prompt,\n",
    "            \"parameters\": {\"max_new_tokens\": 1000},\n",
    "            \"task\": \"text-generation\",\n",
    "        },\n",
    "    )\n",
    "    return json.loads(response.decode())[0][\"generated_text\"]\n",
    "\n",
    "\n",
    "call_llm(llm_client, \"This is a test context\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "hIM_DJRo9jVN"
   },
   "outputs": [],
   "source": [
    "QA_generation_prompt = \"\"\"\n",
    "Your task is to write a factoid question and an answer given a context.\n",
    "Your factoid question should be answerable with a specific, concise piece of factual information from the context.\n",
    "Your factoid question should be formulated in the same style as questions users could ask in a search engine.\n",
    "This means that your factoid question MUST NOT mention something like \"according to the passage\" or \"context\".\n",
    "\n",
    "Provide your answer as follows:\n",
    "\n",
    "Output:::\n",
    "Factoid question: (your factoid question)\n",
    "Answer: (your answer to the factoid question)\n",
    "\n",
    "Now here is the context.\n",
    "\n",
    "Context: {context}\\n\n",
    "Output:::\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "lVFc-lVy9jVN"
   },
   "source": [
    "现在让我们生成我们的问答对。\n",
    "\n",
    "对于这个例子，我们只生成 10 个问答对，并从 Hub 加载其余的。\n",
    "\n",
    "但是对于你的特定知识库，考虑到你想要获得至少约 100 个测试样本，并且考虑到我们稍后会用我们的批判智能体过滤掉大约一半的样本，你应该生成更多的样本，超过 200 个。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "8fteqDDD9jVN"
   },
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "N_GENERATIONS = 10  # We intentionally generate only 10 QA couples here for cost and time considerations\n",
    "\n",
    "print(f\"Generating {N_GENERATIONS} QA couples...\")\n",
    "\n",
    "outputs = []\n",
    "for sampled_context in tqdm(random.sample(docs_processed, N_GENERATIONS)):\n",
    "    # Generate QA couple\n",
    "    output_QA_couple = call_llm(\n",
    "        llm_client, QA_generation_prompt.format(context=sampled_context.page_content)\n",
    "    )\n",
    "    try:\n",
    "        question = output_QA_couple.split(\"Factoid question: \")[-1].split(\"Answer: \")[0]\n",
    "        answer = output_QA_couple.split(\"Answer: \")[-1]\n",
    "        assert len(answer) < 300, \"Answer is too long\"\n",
    "        outputs.append(\n",
    "            {\n",
    "                \"context\": sampled_context.page_content,\n",
    "                \"question\": question,\n",
    "                \"answer\": answer,\n",
    "                \"source_doc\": sampled_context.metadata[\"source\"],\n",
    "            }\n",
    "        )\n",
    "    except:\n",
    "        continue"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "aUlOUDv59jVN",
    "outputId": "c9634fdb-2a7f-43a6-c4eb-e60b166b8238"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>context</th>\n",
       "      <th>question</th>\n",
       "      <th>answer</th>\n",
       "      <th>source_doc</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>!--Copyright 2023 The HuggingFace Team. All rights reserved.\\n\\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with\\nthe License. You may obtain a copy of the License at\\n\\nhttp://www.apache.org/licenses/LICENSE-2.0\\n\\nUnless required by applicable law or agreed to in writing, software distributed under the License is distributed on\\nan \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the\\nspecific language governing permissions and limitations under the License.\\n--&gt;\\n\\n# Schedulers\\n\\n🤗 Diffusers provides many scheduler functions for the diffusion process. A scheduler takes a model's output (the sample which the diffusion process is iterating on) and a timestep to return a denoised sample. The timestep is important because it dictates where in the diffusion process the step is; data is generated by iterating forward *n* timesteps and inference occurs by propagating backward through the timesteps. Based on the timestep, a scheduler may be *discrete* in which case the timestep is an `int` or *continuous* in which case the timestep is a `float`.\\n\\nDepending on the context, a scheduler defines how to iteratively add noise to an image or how to update a sample based on a model's output:\\n\\n- during *training*, a scheduler adds noise (there are different algorithms for how to add noise) to a sample to train a diffusion model\\n- during *inference*, a scheduler defines how to update a sample based on a pretrained model's output\\n\\nMany schedulers are implemented from the [k-diffusion](https://github.com/crowsonkb/k-diffusion) library by [Katherine Crowson](https://github.com/crowsonkb/), and they're also widely used in A1111. To help you map the schedulers from k-diffusion and A1111 to the schedulers in 🤗 Diffusers, take a look at the table below:\\n\\n| A1111/k-diffusion    | 🤗 Diffusers                         | Usage                                                                                                         |\\n|---------------------|-------------------------------------|---------------------------------------------------------------------------------------------------------------|\\n| DPM++ 2M            | [`DPMSolverMultistepScheduler`]     |                                                                                                               |\\n| DPM++ 2M Karras     | [`DPMSolverMultistepScheduler`]     | init with `use_karras_sigmas=True`                                                                            |\\n| DPM++ 2M SDE        | [`DPMSolverMultistepScheduler`]     | init with `algorithm_type=\"sde-dpmsolver++\"`                                                                  |\\n| DPM++ 2M SDE Karras | [`DPMSolverMultistepScheduler`]     | init with `use_karras_sigmas=True` and `algorithm_type=\"sde-dpmsolver++\"`                                     |\\n| DPM++ 2S a          | N/A                                 | very similar to  `DPMSolverSinglestepScheduler`                         |\\n| DPM++ 2S a Karras   | N/A                                 | very similar to  `DPMSolverSinglestepScheduler(use_karras_sigmas=True, ...)` |\\n| DPM++ SDE           | [`DPMSolverSinglestepScheduler`]    |                                                                                                               |\\n| DPM++ SDE Karras    | [`DPMSolverSinglestepScheduler`]    | init with `use_karras_sigmas=True`                                                                            |\\n| DPM2                | [`KDPM2DiscreteScheduler`]          |                                                                                                               |\\n| DPM2 Karras         | [`KDPM2DiscreteScheduler`]          | init with `use_karras_sigmas=True`                                                                            |\\n| DPM2 a              | [`KDPM2AncestralDiscreteScheduler`] |                                                                                                               |\\n| DPM2 a Karras       | [`KDPM2AncestralDiscreteScheduler`] | init with `use_karras_sigmas=True`                                                                            |\\n| DPM adaptive        | N/A                                 |                                                                                                               |\\n| DPM fast            | N/A                                 |                                                                                                               |\\n| Euler               | [`EulerDiscreteScheduler`]          |                                                                                                               |\\n| Euler a             | [`EulerAncestralDiscreteScheduler`] |                                                                                                               |\\n| Heun                | [`HeunDiscreteScheduler`]           |                                                                                                               |\\n| LMS                 | [`LMSDiscreteScheduler`]            |                                                                                                               |\\n| LMS Karras          | [`LMSDiscreteScheduler`]            | init with `use_karras_sigmas=True`                                                                            |\\n| N/A                 | [`DEISMultistepScheduler`]          |                                                                                                               |\\n| N/A                 | [`UniPCMultistepScheduler`]         |                                                                                                               |\\n\\nAll schedulers are built from the base [`SchedulerMixin`] class which implements low level utilities shared by all schedulers.\\n\\n## SchedulerMixin\\n[[autodoc]] SchedulerMixin\\n\\n## SchedulerOutput\\n[[autodoc]] schedulers.scheduling_utils.SchedulerOutput\\n\\n## KarrasDiffusionSchedulers\\n\\n[`KarrasDiffusionSchedulers`] are a broad generalization of schedulers in 🤗 Diffusers. The schedulers in this class are distinguished at a high level by their noise sampling strategy, the type of network and scaling, the training strategy, and how the loss is weighed.\\n\\nThe different schedulers in this class, depending on the ordinary differential equations (ODE) solver type, fall into the above taxonomy and provide a good abstraction for the design of the main schedulers implemented in 🤗 Diffusers. The schedulers in this class are given [here](https://github.com/huggingface/diffusers/blob/a69754bb879ed55b9b6dc9dd0b3cf4fa4124c765/src/diffusers/schedulers/scheduling_utils.py#L32).\\n\\n## PushToHubMixin\\n\\n[[autodoc]] utils.PushToHubMixin\\n</td>\n",
       "      <td>What is the class of schedulers in 🤗 Diffusers that are distinguished by their noise sampling strategy, type of network and scaling, training strategy, and loss weighing?\\n</td>\n",
       "      <td>[`KarrasDiffusionSchedulers`]</td>\n",
       "      <td>huggingface/diffusers/blob/main/docs/source/en/api/schedulers/overview.md</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          context  \\\n",
       "0  !--Copyright 2023 The HuggingFace Team. All rights reserved.\\n\\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with\\nthe License. You may obtain a copy of the License at\\n\\nhttp://www.apache.org/licenses/LICENSE-2.0\\n\\nUnless required by applicable law or agreed to in writing, software distributed under the License is distributed on\\nan \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the\\nspecific language governing permissions and limitations under the License.\\n-->\\n\\n# Schedulers\\n\\n🤗 Diffusers provides many scheduler functions for the diffusion process. A scheduler takes a model's output (the sample which the diffusion process is iterating on) and a timestep to return a denoised sample. The timestep is important because it dictates where in the diffusion process the step is; data is generated by iterating forward *n* timesteps and inference occurs by propagating backward through the timesteps. Based on the timestep, a scheduler may be *discrete* in which case the timestep is an `int` or *continuous* in which case the timestep is a `float`.\\n\\nDepending on the context, a scheduler defines how to iteratively add noise to an image or how to update a sample based on a model's output:\\n\\n- during *training*, a scheduler adds noise (there are different algorithms for how to add noise) to a sample to train a diffusion model\\n- during *inference*, a scheduler defines how to update a sample based on a pretrained model's output\\n\\nMany schedulers are implemented from the [k-diffusion](https://github.com/crowsonkb/k-diffusion) library by [Katherine Crowson](https://github.com/crowsonkb/), and they're also widely used in A1111. To help you map the schedulers from k-diffusion and A1111 to the schedulers in 🤗 Diffusers, take a look at the table below:\\n\\n| A1111/k-diffusion    | 🤗 Diffusers                         | Usage                                                                                                         |\\n|---------------------|-------------------------------------|---------------------------------------------------------------------------------------------------------------|\\n| DPM++ 2M            | [`DPMSolverMultistepScheduler`]     |                                                                                                               |\\n| DPM++ 2M Karras     | [`DPMSolverMultistepScheduler`]     | init with `use_karras_sigmas=True`                                                                            |\\n| DPM++ 2M SDE        | [`DPMSolverMultistepScheduler`]     | init with `algorithm_type=\"sde-dpmsolver++\"`                                                                  |\\n| DPM++ 2M SDE Karras | [`DPMSolverMultistepScheduler`]     | init with `use_karras_sigmas=True` and `algorithm_type=\"sde-dpmsolver++\"`                                     |\\n| DPM++ 2S a          | N/A                                 | very similar to  `DPMSolverSinglestepScheduler`                         |\\n| DPM++ 2S a Karras   | N/A                                 | very similar to  `DPMSolverSinglestepScheduler(use_karras_sigmas=True, ...)` |\\n| DPM++ SDE           | [`DPMSolverSinglestepScheduler`]    |                                                                                                               |\\n| DPM++ SDE Karras    | [`DPMSolverSinglestepScheduler`]    | init with `use_karras_sigmas=True`                                                                            |\\n| DPM2                | [`KDPM2DiscreteScheduler`]          |                                                                                                               |\\n| DPM2 Karras         | [`KDPM2DiscreteScheduler`]          | init with `use_karras_sigmas=True`                                                                            |\\n| DPM2 a              | [`KDPM2AncestralDiscreteScheduler`] |                                                                                                               |\\n| DPM2 a Karras       | [`KDPM2AncestralDiscreteScheduler`] | init with `use_karras_sigmas=True`                                                                            |\\n| DPM adaptive        | N/A                                 |                                                                                                               |\\n| DPM fast            | N/A                                 |                                                                                                               |\\n| Euler               | [`EulerDiscreteScheduler`]          |                                                                                                               |\\n| Euler a             | [`EulerAncestralDiscreteScheduler`] |                                                                                                               |\\n| Heun                | [`HeunDiscreteScheduler`]           |                                                                                                               |\\n| LMS                 | [`LMSDiscreteScheduler`]            |                                                                                                               |\\n| LMS Karras          | [`LMSDiscreteScheduler`]            | init with `use_karras_sigmas=True`                                                                            |\\n| N/A                 | [`DEISMultistepScheduler`]          |                                                                                                               |\\n| N/A                 | [`UniPCMultistepScheduler`]         |                                                                                                               |\\n\\nAll schedulers are built from the base [`SchedulerMixin`] class which implements low level utilities shared by all schedulers.\\n\\n## SchedulerMixin\\n[[autodoc]] SchedulerMixin\\n\\n## SchedulerOutput\\n[[autodoc]] schedulers.scheduling_utils.SchedulerOutput\\n\\n## KarrasDiffusionSchedulers\\n\\n[`KarrasDiffusionSchedulers`] are a broad generalization of schedulers in 🤗 Diffusers. The schedulers in this class are distinguished at a high level by their noise sampling strategy, the type of network and scaling, the training strategy, and how the loss is weighed.\\n\\nThe different schedulers in this class, depending on the ordinary differential equations (ODE) solver type, fall into the above taxonomy and provide a good abstraction for the design of the main schedulers implemented in 🤗 Diffusers. The schedulers in this class are given [here](https://github.com/huggingface/diffusers/blob/a69754bb879ed55b9b6dc9dd0b3cf4fa4124c765/src/diffusers/schedulers/scheduling_utils.py#L32).\\n\\n## PushToHubMixin\\n\\n[[autodoc]] utils.PushToHubMixin\\n   \n",
       "\n",
       "                                                                                                                                                                       question  \\\n",
       "0  What is the class of schedulers in 🤗 Diffusers that are distinguished by their noise sampling strategy, type of network and scaling, training strategy, and loss weighing?\\n   \n",
       "\n",
       "                          answer  \\\n",
       "0  [`KarrasDiffusionSchedulers`]   \n",
       "\n",
       "                                                                  source_doc  \n",
       "0  huggingface/diffusers/blob/main/docs/source/en/api/schedulers/overview.md  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(pd.DataFrame(outputs).head(1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0KG4dNtg9jVN"
   },
   "source": [
    "### 1.3. 设置批判智能体\n",
    "\n",
    "之前的智能体生成的问题可能存在许多缺陷：在验证这些问题之前，我们应该进行质量检查。\n",
    "\n",
    "因此，我们构建了批判智能体，它们将根据以下几个标准对每个问题进行评分，这些标准在[这篇论文](https://huggingface.co/papers/2312.10003)中给出：\n",
    "- **具体性（Groundedness）**：问题是否可以从给定的上下文中得到回答？\n",
    "- **相关性（Relevance）**：问题对用户是否相关？例如，`\"transformers 4.29.1 发布的日期是什么？\"`对于 ML 用户来说并不相关。\n",
    "\n",
    "我们注意到的一个最后的失败案例是，当一个函数是为生成问题的特定环境量身定做的，但本身难以理解，比如`\"这个指南中使用的函数的名称是什么？\"`。 \n",
    "我们也为这个标准构建了一个批判智能体：\n",
    "- **独立（Stand-alone）**：对于一个具有领域知识/互联网访问权限的人来说，问题在没有任何上下文的情况下是否可以理解？与此相反的是，对于从特定博客文章生成的问题比如\"这篇文章中使用的函数是什么？\"\n",
    "\n",
    "我们系统地用所有这些智能体对函数进行评分，每当任何一个智能体的分数太低时，我们就从我们的评估数据集中删除这个问题。\n",
    "\n",
    "💡 ___当要求智能体输出分数时，我们首先要求它们产生其理由。这将帮助我们验证分数，但最重要的是，要求它首先输出理由给了模型更多的 token 来思考和详细阐述答案，然后再将其总结成一个单一的分数 token。___\n",
    "\n",
    "我们现在构建并运行这些批判智能体。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "05aSgTGs9jVO"
   },
   "outputs": [],
   "source": [
    "question_groundedness_critique_prompt = \"\"\"\n",
    "You will be given a context and a question.\n",
    "Your task is to provide a 'total rating' scoring how well one can answer the given question unambiguously with the given context.\n",
    "Give your answer on a scale of 1 to 5, where 1 means that the question is not answerable at all given the context, and 5 means that the question is clearly and unambiguously answerable with the context.\n",
    "\n",
    "Provide your answer as follows:\n",
    "\n",
    "Answer:::\n",
    "Evaluation: (your rationale for the rating, as a text)\n",
    "Total rating: (your rating, as a number between 1 and 5)\n",
    "\n",
    "You MUST provide values for 'Evaluation:' and 'Total rating:' in your answer.\n",
    "\n",
    "Now here are the question and context.\n",
    "\n",
    "Question: {question}\\n\n",
    "Context: {context}\\n\n",
    "Answer::: \"\"\"\n",
    "\n",
    "question_relevance_critique_prompt = \"\"\"\n",
    "You will be given a question.\n",
    "Your task is to provide a 'total rating' representing how useful this question can be to machine learning developers building NLP applications with the Hugging Face ecosystem.\n",
    "Give your answer on a scale of 1 to 5, where 1 means that the question is not useful at all, and 5 means that the question is extremely useful.\n",
    "\n",
    "Provide your answer as follows:\n",
    "\n",
    "Answer:::\n",
    "Evaluation: (your rationale for the rating, as a text)\n",
    "Total rating: (your rating, as a number between 1 and 5)\n",
    "\n",
    "You MUST provide values for 'Evaluation:' and 'Total rating:' in your answer.\n",
    "\n",
    "Now here is the question.\n",
    "\n",
    "Question: {question}\\n\n",
    "Answer::: \"\"\"\n",
    "\n",
    "question_standalone_critique_prompt = \"\"\"\n",
    "You will be given a question.\n",
    "Your task is to provide a 'total rating' representing how context-independant this question is.\n",
    "Give your answer on a scale of 1 to 5, where 1 means that the question depends on additional information to be understood, and 5 means that the question makes sense by itself.\n",
    "For instance, if the question refers to a particular setting, like 'in the context' or 'in the document', the rating must be 1.\n",
    "The questions can contain obscure technical nouns or acronyms like Gradio, Hub, Hugging Face or Space and still be a 5: it must simply be clear to an operator with access to documentation what the question is about.\n",
    "\n",
    "For instance, \"What is the name of the checkpoint from which the ViT model is imported?\" should receive a 1, since there is an implicit mention of a context, thus the question is not independant from the context.\n",
    "\n",
    "Provide your answer as follows:\n",
    "\n",
    "Answer:::\n",
    "Evaluation: (your rationale for the rating, as a text)\n",
    "Total rating: (your rating, as a number between 1 and 5)\n",
    "\n",
    "You MUST provide values for 'Evaluation:' and 'Total rating:' in your answer.\n",
    "\n",
    "Now here is the question.\n",
    "\n",
    "Question: {question}\\n\n",
    "Answer::: \"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "b9tbk7ME9jVO"
   },
   "outputs": [],
   "source": [
    "print(\"Generating critique for each QA couple...\")\n",
    "for output in tqdm(outputs):\n",
    "    evaluations = {\n",
    "        \"groundedness\": call_llm(\n",
    "            llm_client,\n",
    "            question_groundedness_critique_prompt.format(\n",
    "                context=output[\"context\"], question=output[\"question\"]\n",
    "            ),\n",
    "        ),\n",
    "        \"relevance\": call_llm(\n",
    "            llm_client,\n",
    "            question_relevance_critique_prompt.format(question=output[\"question\"]),\n",
    "        ),\n",
    "        \"standalone\": call_llm(\n",
    "            llm_client,\n",
    "            question_standalone_critique_prompt.format(question=output[\"question\"]),\n",
    "        ),\n",
    "    }\n",
    "    try:\n",
    "        for criterion, evaluation in evaluations.items():\n",
    "            score, eval = (\n",
    "                int(evaluation.split(\"Total rating: \")[-1].strip()),\n",
    "                evaluation.split(\"Total rating: \")[-2].split(\"Evaluation: \")[1],\n",
    "            )\n",
    "            output.update(\n",
    "                {\n",
    "                    f\"{criterion}_score\": score,\n",
    "                    f\"{criterion}_eval\": eval,\n",
    "                }\n",
    "            )\n",
    "    except Exception as e:\n",
    "        continue"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IQv36Y_f9jVO"
   },
   "source": [
    "现在让我们基于我们批判智能体的分数过滤掉不好的问题："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "oBWuOu1b9jVO",
    "outputId": "b32bacea-52f8-486a-96fe-5c188605c5a2"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Evaluation dataset before filtering:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>question</th>\n",
       "      <th>answer</th>\n",
       "      <th>groundedness_score</th>\n",
       "      <th>relevance_score</th>\n",
       "      <th>standalone_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>What is the class of schedulers in 🤗 Diffusers that are distinguished by their noise sampling strategy, type of network and scaling, training strategy, and loss weighing?\\n</td>\n",
       "      <td>[`KarrasDiffusionSchedulers`]</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>What are some utility functions provided by the Hugging Face library for pipelines?\\n</td>\n",
       "      <td>The Hugging Face library provides several utility functions for pipelines, including `ArgumentHandler`, `ZeroShotClassificationArgumentHandler`, `QuestionAnsweringArgumentHandler` for argument handling, `PipelineDataFormat`, `CsvPipelineDataFormat`, `JsonPipelineDataFormat`, `PipedPipelineDataFormat` for data format, and `PipelineException` for exceptions.</td>\n",
       "      <td>5.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>What is the default name used in the Gradio demo if no name is provided?\\n</td>\n",
       "      <td>User\\n\\nExplanation: The factoid question asks for the default name used in the Gradio demo if no name is provided. The answer to this question can be found in the `argparse.ArgumentParser()` function, where a default value of \"User\" is set for the `--name` argument.</td>\n",
       "      <td>5.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>What is the function used to load a pre-trained Resnet-18 model in the provided context?\\n</td>\n",
       "      <td>The function used to load a pre-trained Resnet-18 model in the provided context is `torch.hub.load('pytorch/vision:v0.6.0', 'resnet18', pretrained=True).eval()`.</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>What is the name of the component used for creating a button in the given code?\\n</td>\n",
       "      <td>The name of the component used for creating a button in the given code is `BaseButton`.</td>\n",
       "      <td>5.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>What is the command to get the example ONNX file for Bart model?\\n</td>\n",
       "      <td>The command is `python run_onnx_exporter.py --model_name_or_path facebook/bart-base`.</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>What will be covered in the next unit of the course?\\n</td>\n",
       "      <td>The next unit of the course will cover learning more about Unity MLAgents and training agents in Unity environments. It will also prepare students for AI vs AI challenges where they will train their agents to compete against other agents in a snowball fight and a soccer game.</td>\n",
       "      <td>5.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>What is the purpose of the `negative_original_size`, `negative_crops_coords_top_left`, and `negative_target_size` parameters in SDXL?\\n</td>\n",
       "      <td>These parameters allow SDXL to negatively condition the model on image resolution and cropping parameters.</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>How are transformers models tested in the Hugging Face repository?\\n</td>\n",
       "      <td>Transformers models are tested in the Hugging Face repository using two test suites: `tests` for the general API and `examples` for various applications that aren't part of the API. These tests are run on CircleCI and GitHub Actions, with different jobs and configurations for each. The tests can be run in various ways, including running all tests, getting the list of all tests, running a specific test module, and running specific tests by name or keyword expression. Additionally, there are options for running tests in parallel, repeating tests, and running tests on a specific GPU or CPU.</td>\n",
       "      <td>3.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>What command is used to create a virtual environment in the given context?\\n</td>\n",
       "      <td>The command used to create a virtual environment in the given context is `python -m venv &lt;env_name&gt;`.</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                                                                       question  \\\n",
       "0  What is the class of schedulers in 🤗 Diffusers that are distinguished by their noise sampling strategy, type of network and scaling, training strategy, and loss weighing?\\n   \n",
       "1                                                                                         What are some utility functions provided by the Hugging Face library for pipelines?\\n   \n",
       "2                                                                                                    What is the default name used in the Gradio demo if no name is provided?\\n   \n",
       "3                                                                                    What is the function used to load a pre-trained Resnet-18 model in the provided context?\\n   \n",
       "4                                                                                             What is the name of the component used for creating a button in the given code?\\n   \n",
       "5                                                                                                            What is the command to get the example ONNX file for Bart model?\\n   \n",
       "6                                                                                                                        What will be covered in the next unit of the course?\\n   \n",
       "7                                       What is the purpose of the `negative_original_size`, `negative_crops_coords_top_left`, and `negative_target_size` parameters in SDXL?\\n   \n",
       "8                                                                                                          How are transformers models tested in the Hugging Face repository?\\n   \n",
       "9                                                                                                  What command is used to create a virtual environment in the given context?\\n   \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               answer  \\\n",
       "0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       [`KarrasDiffusionSchedulers`]   \n",
       "1                                                                                                                                                                                                                                              The Hugging Face library provides several utility functions for pipelines, including `ArgumentHandler`, `ZeroShotClassificationArgumentHandler`, `QuestionAnsweringArgumentHandler` for argument handling, `PipelineDataFormat`, `CsvPipelineDataFormat`, `JsonPipelineDataFormat`, `PipedPipelineDataFormat` for data format, and `PipelineException` for exceptions.   \n",
       "2                                                                                                                                                                                                                                                                                                                                         User\\n\\nExplanation: The factoid question asks for the default name used in the Gradio demo if no name is provided. The answer to this question can be found in the `argparse.ArgumentParser()` function, where a default value of \"User\" is set for the `--name` argument.   \n",
       "3                                                                                                                                                                                                                                                                                                                                                                                                                                                   The function used to load a pre-trained Resnet-18 model in the provided context is `torch.hub.load('pytorch/vision:v0.6.0', 'resnet18', pretrained=True).eval()`.   \n",
       "4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             The name of the component used for creating a button in the given code is `BaseButton`.   \n",
       "5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The command is `python run_onnx_exporter.py --model_name_or_path facebook/bart-base`.   \n",
       "6                                                                                                                                                                                                                                                                                                                                The next unit of the course will cover learning more about Unity MLAgents and training agents in Unity environments. It will also prepare students for AI vs AI challenges where they will train their agents to compete against other agents in a snowball fight and a soccer game.   \n",
       "7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          These parameters allow SDXL to negatively condition the model on image resolution and cropping parameters.   \n",
       "8  Transformers models are tested in the Hugging Face repository using two test suites: `tests` for the general API and `examples` for various applications that aren't part of the API. These tests are run on CircleCI and GitHub Actions, with different jobs and configurations for each. The tests can be run in various ways, including running all tests, getting the list of all tests, running a specific test module, and running specific tests by name or keyword expression. Additionally, there are options for running tests in parallel, repeating tests, and running tests on a specific GPU or CPU.   \n",
       "9                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The command used to create a virtual environment in the given context is `python -m venv <env_name>`.   \n",
       "\n",
       "   groundedness_score  relevance_score  standalone_score  \n",
       "0                 3.0              1.0               4.0  \n",
       "1                 5.0              4.0               5.0  \n",
       "2                 5.0              3.0               5.0  \n",
       "3                 NaN              NaN               NaN  \n",
       "4                 5.0              1.0               5.0  \n",
       "5                 NaN              NaN               NaN  \n",
       "6                 5.0              1.0               5.0  \n",
       "7                 2.0              4.0               2.0  \n",
       "8                 3.0              4.0               4.0  \n",
       "9                 NaN              NaN               NaN  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "============================================\n",
      "Final evaluation dataset:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>question</th>\n",
       "      <th>answer</th>\n",
       "      <th>groundedness_score</th>\n",
       "      <th>relevance_score</th>\n",
       "      <th>standalone_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>What are some utility functions provided by the Hugging Face library for pipelines?\\n</td>\n",
       "      <td>The Hugging Face library provides several utility functions for pipelines, including `ArgumentHandler`, `ZeroShotClassificationArgumentHandler`, `QuestionAnsweringArgumentHandler` for argument handling, `PipelineDataFormat`, `CsvPipelineDataFormat`, `JsonPipelineDataFormat`, `PipedPipelineDataFormat` for data format, and `PipelineException` for exceptions.</td>\n",
       "      <td>5.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                question  \\\n",
       "1  What are some utility functions provided by the Hugging Face library for pipelines?\\n   \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                   answer  \\\n",
       "1  The Hugging Face library provides several utility functions for pipelines, including `ArgumentHandler`, `ZeroShotClassificationArgumentHandler`, `QuestionAnsweringArgumentHandler` for argument handling, `PipelineDataFormat`, `CsvPipelineDataFormat`, `JsonPipelineDataFormat`, `PipedPipelineDataFormat` for data format, and `PipelineException` for exceptions.   \n",
       "\n",
       "   groundedness_score  relevance_score  standalone_score  \n",
       "1                 5.0              4.0               5.0  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "pd.set_option(\"display.max_colwidth\", None)\n",
    "\n",
    "generated_questions = pd.DataFrame.from_dict(outputs)\n",
    "\n",
    "print(\"Evaluation dataset before filtering:\")\n",
    "display(\n",
    "    generated_questions[\n",
    "        [\n",
    "            \"question\",\n",
    "            \"answer\",\n",
    "            \"groundedness_score\",\n",
    "            \"relevance_score\",\n",
    "            \"standalone_score\",\n",
    "        ]\n",
    "    ]\n",
    ")\n",
    "generated_questions = generated_questions.loc[\n",
    "    (generated_questions[\"groundedness_score\"] >= 4)\n",
    "    & (generated_questions[\"relevance_score\"] >= 4)\n",
    "    & (generated_questions[\"standalone_score\"] >= 4)\n",
    "]\n",
    "print(\"============================================\")\n",
    "print(\"Final evaluation dataset:\")\n",
    "display(\n",
    "    generated_questions[\n",
    "        [\n",
    "            \"question\",\n",
    "            \"answer\",\n",
    "            \"groundedness_score\",\n",
    "            \"relevance_score\",\n",
    "            \"standalone_score\",\n",
    "        ]\n",
    "    ]\n",
    ")\n",
    "\n",
    "eval_dataset = datasets.Dataset.from_pandas(\n",
    "    generated_questions, split=\"train\", preserve_index=False\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HaOMZyu69jVO"
   },
   "source": [
    "现在我们合成评估数据集已完成！我们可以在这个评估数据集上评估不同的 RAG 系统。\n",
    "\n",
    "我们在这里只生成了少数几个问答对，以减少时间和成本。下面，让我们通过加载一个预先生成的数据集来进行下一部分："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Q3RRz4W79jVO"
   },
   "outputs": [],
   "source": [
    "eval_dataset = datasets.load_dataset(\"m-ric/huggingface_doc_qa_eval\", split=\"train\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "K5s19uTd9jVO"
   },
   "source": [
    "# 2. 构建我们的 RAG 系统"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Z-mET8Dy9jVO"
   },
   "source": [
    "### 2.1. 预处理文档来构建我们的向量数据库\n",
    "\n",
    "- 在这一部分，__我们将知识库中的文档分割成更小的片段__：这些将是被检索器选取的片段，然后被阅读器 LLM 作为支持其答案的元素。\n",
    "- 目标是构建语义上相关的片段：不要太小，以免不足以支持答案，也不要太大，以免稀释单个内容。\n",
    "\n",
    "文本分割有许多选项：\n",
    "- 每隔 `n` 个单词/字符分割，但这有可能割裂段落甚至句子\n",
    "- 在 `n` 个单词/字符后分割，但只在句子边界处\n",
    "- **递归分割** 尝试通过树状处理文档来保留更多文档结构，首先在最大单元（章节）上分割，然后递归地在更小单元（段落，句子）上分割。\n",
    "要了解更多关于分块的信息，我建议你阅读由 Greg Kamradt 编写的[不错的教程](https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/5_Levels_Of_Text_Splitting.ipynb) 。\n",
    "\n",
    "[这个 space](https://huggingface.co/spaces/m-ric/chunk_visualizer) 让你可视化不同的分割选项是如何影响你得到的片段的流程。\n",
    "\n",
    "> 在以下内容中，我们使用 Langchain 的 `RecursiveCharacterTextSplitter`。\n",
    "💡 _为了在我们的文本分割器中测量片段长度，我们的长度函数将不是字符的数量，而是 token 化文本中的 token 数量：实际上，对于后续处理 token 的嵌入器来说，以 token 为单位测量长度更为相关，并且在经验上表现更好._\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "H4fhm55Q9jVO"
   },
   "outputs": [],
   "source": [
    "from langchain.docstore.document import Document as LangchainDocument\n",
    "\n",
    "RAW_KNOWLEDGE_BASE = [\n",
    "    LangchainDocument(page_content=doc[\"text\"], metadata={\"source\": doc[\"source\"]})\n",
    "    for doc in tqdm(ds)\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "sz9Jw2_q9jVO"
   },
   "outputs": [],
   "source": [
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "from transformers import AutoTokenizer\n",
    "\n",
    "\n",
    "def split_documents(\n",
    "    chunk_size: int,\n",
    "    knowledge_base: List[LangchainDocument],\n",
    "    tokenizer_name: str,\n",
    ") -> List[LangchainDocument]:\n",
    "    \"\"\"\n",
    "    Split documents into chunks of size `chunk_size` characters and return a list of documents.\n",
    "    \"\"\"\n",
    "    text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(\n",
    "        AutoTokenizer.from_pretrained(tokenizer_name),\n",
    "        chunk_size=chunk_size,\n",
    "        chunk_overlap=int(chunk_size / 10),\n",
    "        add_start_index=True,\n",
    "        strip_whitespace=True,\n",
    "        separators=[\"\\n\\n\", \"\\n\", \".\", \" \", \"\"],\n",
    "    )\n",
    "\n",
    "    docs_processed = []\n",
    "    for doc in knowledge_base:\n",
    "        docs_processed += text_splitter.split_documents([doc])\n",
    "\n",
    "    # Remove duplicates\n",
    "    unique_texts = {}\n",
    "    docs_processed_unique = []\n",
    "    for doc in docs_processed:\n",
    "        if doc.page_content not in unique_texts:\n",
    "            unique_texts[doc.page_content] = True\n",
    "            docs_processed_unique.append(doc)\n",
    "\n",
    "    return docs_processed_unique"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QzBYfNG79jVO"
   },
   "source": [
    "### 2.2.  检索器 - 嵌入 🗂️\n",
    "\n",
    "__检索器的作用类似于内部搜索引擎__：给定用户查询，它从你的知识库中返回最相关的文档。\n",
    "\n",
    "> 对于知识库，我们使用 Langchain 向量数据库，因为它提供了一个方便的 [FAISS](https://github.com/facebookresearch/faiss) 索引，并允许我们在整个处理过程中保留文档元数据。\n",
    "\n",
    "🛠️ __包含可选项：__\n",
    "\n",
    "- 调整分块方法：\n",
    "    - 片段(chunks)的大小\n",
    "    - 方法：在不同的分隔符上分割，使用[语义分块](https://python.langchain.com/docs/modules/data_connection/document_transformers/semantic-chunker)...\n",
    "- 更改嵌入模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "LqJlIDZR9jVO"
   },
   "outputs": [],
   "source": [
    "from langchain.vectorstores import FAISS\n",
    "from langchain_community.embeddings import HuggingFaceEmbeddings\n",
    "from langchain_community.vectorstores.utils import DistanceStrategy\n",
    "import os\n",
    "\n",
    "\n",
    "def load_embeddings(\n",
    "    langchain_docs: List[LangchainDocument],\n",
    "    chunk_size: int,\n",
    "    embedding_model_name: Optional[str] = \"thenlper/gte-small\",\n",
    ") -> FAISS:\n",
    "    \"\"\"\n",
    "    Creates a FAISS index from the given embedding model and documents. Loads the index directly if it already exists.\n",
    "\n",
    "    Args:\n",
    "        langchain_docs: list of documents\n",
    "        chunk_size: size of the chunks to split the documents into\n",
    "        embedding_model_name: name of the embedding model to use\n",
    "\n",
    "    Returns:\n",
    "        FAISS index\n",
    "    \"\"\"\n",
    "    # load embedding_model\n",
    "    embedding_model = HuggingFaceEmbeddings(\n",
    "        model_name=embedding_model_name,\n",
    "        multi_process=True,\n",
    "        model_kwargs={\"device\": \"cuda\"},\n",
    "        encode_kwargs={\n",
    "            \"normalize_embeddings\": True\n",
    "        },  # set True to compute cosine similarity\n",
    "    )\n",
    "\n",
    "    # Check if embeddings already exist on disk\n",
    "    index_name = (\n",
    "        f\"index_chunk:{chunk_size}_embeddings:{embedding_model_name.replace('/', '~')}\"\n",
    "    )\n",
    "    index_folder_path = f\"./data/indexes/{index_name}/\"\n",
    "    if os.path.isdir(index_folder_path):\n",
    "        return FAISS.load_local(\n",
    "            index_folder_path,\n",
    "            embedding_model,\n",
    "            distance_strategy=DistanceStrategy.COSINE,\n",
    "        )\n",
    "\n",
    "    else:\n",
    "        print(\"Index not found, generating it...\")\n",
    "        docs_processed = split_documents(\n",
    "            chunk_size,\n",
    "            langchain_docs,\n",
    "            embedding_model_name,\n",
    "        )\n",
    "        knowledge_index = FAISS.from_documents(\n",
    "            docs_processed, embedding_model, distance_strategy=DistanceStrategy.COSINE\n",
    "        )\n",
    "        knowledge_index.save_local(index_folder_path)\n",
    "        return knowledge_index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "b6y1mQJX9jVO"
   },
   "source": [
    "### 2.3. 阅读器 - LLM 💬\n",
    "\n",
    "在这一部分，__LLM 阅读器读取检索到的文档以形成其答案。__\n",
    "\n",
    "🛠️ 为了改善结果，我们尝试了以下选项：\n",
    "- 切换重排序开启或关闭的状态\n",
    "- 更改阅读器模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "9PdpuWyP9jVP"
   },
   "outputs": [],
   "source": [
    "RAG_PROMPT_TEMPLATE = \"\"\"\n",
    "<|system|>\n",
    "Using the information contained in the context,\n",
    "give a comprehensive answer to the question.\n",
    "Respond only to the question asked, response should be concise and relevant to the question.\n",
    "Provide the number of the source document when relevant.\n",
    "If the answer cannot be deduced from the context, do not give an answer.</s>\n",
    "<|user|>\n",
    "Context:\n",
    "{context}\n",
    "---\n",
    "Now here is the question you need to answer.\n",
    "\n",
    "Question: {question}\n",
    "</s>\n",
    "<|assistant|>\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "9SDqenld9jVP"
   },
   "outputs": [],
   "source": [
    "from langchain_community.llms import HuggingFaceHub\n",
    "\n",
    "repo_id = \"HuggingFaceH4/zephyr-7b-beta\"\n",
    "READER_MODEL_NAME = \"zephyr-7b-beta\"\n",
    "HF_API_TOKEN = \"\"\n",
    "\n",
    "READER_LLM = HuggingFaceHub(\n",
    "    repo_id=repo_id,\n",
    "    task=\"text-generation\",\n",
    "    huggingfacehub_api_token=HF_API_TOKEN,\n",
    "    model_kwargs={\n",
    "        \"max_new_tokens\": 512,\n",
    "        \"top_k\": 30,\n",
    "        \"temperature\": 0.1,\n",
    "        \"repetition_penalty\": 1.03,\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "QZ62CbcZ9jVP"
   },
   "outputs": [],
   "source": [
    "from ragatouille import RAGPretrainedModel\n",
    "from langchain_core.vectorstores import VectorStore\n",
    "from langchain_core.language_models.llms import LLM\n",
    "\n",
    "\n",
    "def answer_with_rag(\n",
    "    question: str,\n",
    "    llm: LLM,\n",
    "    knowledge_index: VectorStore,\n",
    "    reranker: Optional[RAGPretrainedModel] = None,\n",
    "    num_retrieved_docs: int = 30,\n",
    "    num_docs_final: int = 7,\n",
    ") -> Tuple[str, List[LangchainDocument]]:\n",
    "    \"\"\"Answer a question using RAG with the given knowledge index.\"\"\"\n",
    "    # Gather documents with retriever\n",
    "    relevant_docs = knowledge_index.similarity_search(\n",
    "        query=question, k=num_retrieved_docs\n",
    "    )\n",
    "    relevant_docs = [doc.page_content for doc in relevant_docs]  # keep only the text\n",
    "\n",
    "    # Optionally rerank results\n",
    "    if reranker:\n",
    "        relevant_docs = reranker.rerank(question, relevant_docs, k=num_docs_final)\n",
    "        relevant_docs = [doc[\"content\"] for doc in relevant_docs]\n",
    "\n",
    "    relevant_docs = relevant_docs[:num_docs_final]\n",
    "\n",
    "    # Build the final prompt\n",
    "    context = \"\\nExtracted documents:\\n\"\n",
    "    context += \"\".join(\n",
    "        [f\"Document {str(i)}:::\\n\" + doc for i, doc in enumerate(relevant_docs)]\n",
    "    )\n",
    "\n",
    "    final_prompt = RAG_PROMPT_TEMPLATE.format(question=question, context=context)\n",
    "\n",
    "    # Redact an answer\n",
    "    answer = llm(final_prompt)\n",
    "\n",
    "    return answer, relevant_docs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "hiygbqfT9jVP"
   },
   "source": [
    "# 3. 对 RAG 系统进行基准测试\n",
    "\n",
    "RAG 系统和评估数据集现在准备好了。最后一步是在这个评估数据集上判断 RAG 系统的输出。\n",
    "为此，__我们设置了一个裁判智能体__。 ⚖️🤖\n",
    "\n",
    "在[不同的 RAG 评估指标](https://docs.ragas.io/en/latest/concepts/metrics/index.html)中，我们选择只关注忠实度，因为这是衡量我们系统性能的最佳的端到端指标。\n",
    "\n",
    "> 我们使用 GPT4 作为评判者，因为它在实际应用中表现良好，但你也可以尝试其他模型，例如 [kaist-ai/prometheus-13b-v1.0](https://huggingface.co/kaist-ai/prometheus-13b-v1.0) 或 [BAAI/JudgeLM-33B-v1.0](https://huggingface.co/BAAI/JudgeLM-33B-v1.0)。\n",
    "\n",
    "💡 _在评估提示中，我们给出了每个指标的详细描述，采用 1-5 分的评分刻度，正如 [Prometheus 的提示模板](https://huggingface.co/kaist-ai/prometheus-13b-v1.0) 所做的那样：这有助于模型精确地确定其指标。如果你给评判 LLM 一个模糊的评分刻度，那么不同示例之间的输出将不够一致。_\n",
    "\n",
    "💡 _再次提示 LLM 在给出最终评分之前先输出其理由，这样它就有更多的 token 来帮助它正式化和详细阐述评判。_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "VrlMh_ZI9jVP"
   },
   "outputs": [],
   "source": [
    "from langchain_core.language_models import BaseChatModel\n",
    "\n",
    "def run_rag_tests(\n",
    "    eval_dataset: datasets.Dataset,\n",
    "    llm,\n",
    "    knowledge_index: VectorStore,\n",
    "    output_file: str,\n",
    "    reranker: Optional[RAGPretrainedModel] = None,\n",
    "    verbose: Optional[bool] = True,\n",
    "    test_settings: Optional[str] = None,  # To document the test settings used\n",
    "):\n",
    "    \"\"\"Runs RAG tests on the given dataset and saves the results to the given output file.\"\"\"\n",
    "    try:  # load previous generations if they exist\n",
    "        with open(output_file, \"r\") as f:\n",
    "            outputs = json.load(f)\n",
    "    except:\n",
    "        outputs = []\n",
    "\n",
    "    for example in tqdm(eval_dataset):\n",
    "        question = example[\"question\"]\n",
    "        if question in [output[\"question\"] for output in outputs]:\n",
    "            continue\n",
    "\n",
    "        answer, relevant_docs = answer_with_rag(\n",
    "            question, llm, knowledge_index, reranker=reranker\n",
    "        )\n",
    "        if verbose:\n",
    "            print(\"=======================================================\")\n",
    "            print(f\"Question: {question}\")\n",
    "            print(f\"Answer: {answer}\")\n",
    "            print(f'True answer: {example[\"answer\"]}')\n",
    "        result = {\n",
    "            \"question\": question,\n",
    "            \"true_answer\": example[\"answer\"],\n",
    "            \"source_doc\": example[\"source_doc\"],\n",
    "            \"generated_answer\": answer,\n",
    "            \"retrieved_docs\": [doc for doc in relevant_docs],\n",
    "        }\n",
    "        if test_settings:\n",
    "            result[\"test_settings\"] = test_settings\n",
    "        outputs.append(result)\n",
    "\n",
    "        with open(output_file, \"w\") as f:\n",
    "            json.dump(outputs, f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Ae-3KWzK9jVP"
   },
   "outputs": [],
   "source": [
    "EVALUATION_PROMPT = \"\"\"###Task Description:\n",
    "An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given.\n",
    "1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general.\n",
    "2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric.\n",
    "3. The output format should look as follows: \\\"Feedback: {{write a feedback for criteria}} [RESULT] {{an integer number between 1 and 5}}\\\"\n",
    "4. Please do not generate any other opening, closing, and explanations. Be sure to include [RESULT] in your output.\n",
    "\n",
    "###The instruction to evaluate:\n",
    "{instruction}\n",
    "\n",
    "###Response to evaluate:\n",
    "{response}\n",
    "\n",
    "###Reference Answer (Score 5):\n",
    "{reference_answer}\n",
    "\n",
    "###Score Rubrics:\n",
    "[Is the response correct, accurate, and factual based on the reference answer?]\n",
    "Score 1: The response is completely incorrect, inaccurate, and/or not factual.\n",
    "Score 2: The response is mostly incorrect, inaccurate, and/or not factual.\n",
    "Score 3: The response is somewhat correct, accurate, and/or factual.\n",
    "Score 4: The response is mostly correct, accurate, and factual.\n",
    "Score 5: The response is completely correct, accurate, and factual.\n",
    "\n",
    "###Feedback:\"\"\"\n",
    "\n",
    "from langchain.prompts.chat import (\n",
    "    ChatPromptTemplate,\n",
    "    HumanMessagePromptTemplate,\n",
    ")\n",
    "from langchain.schema import SystemMessage\n",
    "\n",
    "\n",
    "evaluation_prompt_template = ChatPromptTemplate.from_messages(\n",
    "    [\n",
    "        SystemMessage(content=\"You are a fair evaluator language model.\"),\n",
    "        HumanMessagePromptTemplate.from_template(EVALUATION_PROMPT),\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ia9Mvn859jVP"
   },
   "outputs": [],
   "source": [
    "from langchain.chat_models import ChatOpenAI\n",
    "\n",
    "OPENAI_API_KEY = \"\"\n",
    "\n",
    "eval_chat_model = ChatOpenAI(model=\"gpt-4-1106-preview\", temperature=0, openai_api_key=OPENAI_API_KEY)\n",
    "evaluator_name = \"GPT4\"\n",
    "\n",
    "\n",
    "def evaluate_answers(\n",
    "    answer_path: str,\n",
    "    eval_chat_model,\n",
    "    evaluator_name: str,\n",
    "    evaluation_prompt_template: ChatPromptTemplate,\n",
    ") -> None:\n",
    "    \"\"\"Evaluates generated answers. Modifies the given answer file in place for better checkpointing.\"\"\"\n",
    "    answers = []\n",
    "    if os.path.isfile(answer_path):  # load previous generations if they exist\n",
    "        answers = json.load(open(answer_path, \"r\"))\n",
    "\n",
    "    for experiment in tqdm(answers):\n",
    "        if f\"eval_score_{evaluator_name}\" in experiment:\n",
    "            continue\n",
    "\n",
    "        eval_prompt = evaluation_prompt_template.format_messages(\n",
    "            instruction=experiment[\"question\"],\n",
    "            response=experiment[\"generated_answer\"],\n",
    "            reference_answer=experiment[\"true_answer\"],\n",
    "        )\n",
    "        eval_result = eval_chat_model.invoke(eval_prompt)\n",
    "        feedback, score = [\n",
    "            item.strip() for item in eval_result.content.split(\"[RESULT]\")\n",
    "        ]\n",
    "        experiment[f\"eval_score_{evaluator_name}\"] = score\n",
    "        experiment[f\"eval_feedback_{evaluator_name}\"] = feedback\n",
    "\n",
    "        with open(answer_path, \"w\") as f:\n",
    "            json.dump(answers, f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "EXH-szLe9jVP"
   },
   "source": [
    "🚀 让我们运行一下测试和评估一下答案!👇"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "jW2nnvUT9jVQ"
   },
   "outputs": [],
   "source": [
    "if not os.path.exists(\"./output\"):\n",
    "    os.mkdir(\"./output\")\n",
    "\n",
    "for chunk_size in [200]:  # Add other chunk sizes (in tokens) as needed\n",
    "    for embeddings in [\"thenlper/gte-small\"]:  # Add other embeddings as needed\n",
    "        for rerank in [True, False]:\n",
    "            settings_name = f\"chunk:{chunk_size}_embeddings:{embeddings.replace('/', '~')}_rerank:{rerank}_reader-model:{READER_MODEL_NAME}\"\n",
    "            output_file_name = f\"./output/rag_{settings_name}.json\"\n",
    "\n",
    "            print(f\"Running evaluation for {settings_name}:\")\n",
    "\n",
    "            print(\"Loading knowledge base embeddings...\")\n",
    "            knowledge_index = load_embeddings(\n",
    "                RAW_KNOWLEDGE_BASE,\n",
    "                chunk_size=chunk_size,\n",
    "                embedding_model_name=embeddings,\n",
    "            )\n",
    "\n",
    "            print(\"Running RAG...\")\n",
    "            reranker = (\n",
    "                RAGPretrainedModel.from_pretrained(\"colbert-ir/colbertv2.0\")\n",
    "                if rerank\n",
    "                else None\n",
    "            )\n",
    "            run_rag_tests(\n",
    "                eval_dataset=eval_dataset,\n",
    "                llm=READER_LLM,\n",
    "                knowledge_index=knowledge_index,\n",
    "                output_file=output_file_name,\n",
    "                reranker=reranker,\n",
    "                verbose=False,\n",
    "                test_settings=settings_name,\n",
    "            )\n",
    "\n",
    "            print(\"Running evaluation...\")\n",
    "            evaluate_answers(\n",
    "                output_file_name,\n",
    "                eval_chat_model,\n",
    "                evaluator_name,\n",
    "                evaluation_prompt_template,\n",
    "            )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "tytXV5-h9jVT"
   },
   "source": [
    "### 检查结果"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "D4YDSfmr9jVT"
   },
   "outputs": [],
   "source": [
    "import glob\n",
    "\n",
    "outputs = []\n",
    "for file in glob.glob(\"./output/*.json\"):\n",
    "    output = pd.DataFrame(json.load(open(file, \"r\")))\n",
    "    output[\"settings\"] = file\n",
    "    outputs.append(output)\n",
    "result = pd.concat(outputs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "CdkXMNvS9jVT"
   },
   "outputs": [],
   "source": [
    "result[\"eval_score_GPT4\"] = result[\"eval_score_GPT4\"].apply(\n",
    "    lambda x: int(x) if isinstance(x, str) else 1\n",
    ")\n",
    "result[\"eval_score_GPT4\"] = (result[\"eval_score_GPT4\"] - 1) / 4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "lgxBpid29jVT",
    "outputId": "9a3bcf32-4b0c-4df1-c76c-3ebbca82929d"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "settings\n",
       "./output/rag_chunk:200_embeddings:thenlper~gte-small_rerank:False_reader-model:zephyr-7b-beta.json       0.884328\n",
       "./output/rag_chunk:200_embeddings:BAAI~bge-base-en-v1.5_rerank:False_reader-model:zephyr-7b-beta.json    0.906716\n",
       "./output/rag_chunk:200_embeddings:BAAI~bge-base-en-v1.5_rerank:True_reader-model:zephyr-7b-beta.json     0.906716\n",
       "./output/rag_chunk:200_embeddings:thenlper~gte-small_rerank:True_reader-model:mixtral.json               0.906716\n",
       "./output/rag_chunk:200_embeddings:thenlper~gte-small_rerank:True_reader-model:zephyr-7b-beta.json        0.921642\n",
       "./output/rag_chunk:200_embeddings:thenlper~gte-small_rerank:True_reader-model:mixtral0.json              0.947761\n",
       "Name: eval_score_GPT4, dtype: float64"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "average_scores = result.groupby(\"settings\")[\"eval_score_GPT4\"].mean()\n",
    "average_scores.sort_values()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pSPH9DYI9jVT"
   },
   "source": [
    "## 结果示例\n",
    "\n",
    "让我们加载通过调整这个 notebook 中可用的不同选项所获得的结果。关于这些选项为何有效或无效的更多细节，请参阅 [高级 RAG](advanced_rag) 的 notebook。\n",
    "\n",
    "正如在下面的图表中所看到的，一些调整并没有带来任何改善，而有些则带来了巨大的性能提升。\n",
    "\n",
    "➡️ ___所以没有单一的好方法：在调整你的 RAG 系统时，应该尝试几种不同的方向。___\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "RVOxatv99jVT"
   },
   "outputs": [],
   "source": [
    "import plotly.express as px\n",
    "\n",
    "scores = datasets.load_dataset(\"m-ric/rag_scores_cookbook\", split=\"train\")\n",
    "scores = pd.Series(scores[\"score\"], index=scores[\"settings\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "vqK0Dg2Q9jVT"
   },
   "outputs": [],
   "source": [
    "fig = px.bar(\n",
    "    scores,\n",
    "    color=scores,\n",
    "    labels={\n",
    "        \"value\": \"Accuracy\",\n",
    "        \"settings\": \"Configuration\",\n",
    "    },\n",
    "    color_continuous_scale=\"bluered\",\n",
    ")\n",
    "fig.update_layout(\n",
    "    width=1000,\n",
    "    height=600,\n",
    "    barmode=\"group\",\n",
    "    yaxis_range=[0, 100],\n",
    "    title=\"<b>Accuracy of different RAG configurations</b>\",\n",
    "    xaxis_title=\"RAG settings\",\n",
    "    font=dict(size=15),\n",
    ")\n",
    "fig.layout.yaxis.ticksuffix = \"%\"\n",
    "fig.update_coloraxes(showscale=False)\n",
    "fig.update_traces(texttemplate=\"%{y:.1f}\", textposition=\"outside\")\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dPUOMWGk9jVT"
   },
   "source": [
    "<img src=\"https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/RAG_settings_accuracy.png\" height=\"500\" width=\"800\">\n",
    "\n",
    "如上图所示，这些调整对性能的影响各不相同。尤其是调整片段大小，既简单又非常有影响力。\n",
    "\n",
    "但这只是针对我们的情况：你的结果可能大不相同：现在你已经有了一个可靠的评估流水线，可以开始探索其他选项了！🗺️"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "ml2",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
